Scalable Data Space Partitioning In High Dimension
Abstract
A fundamental process in data mining is to approximate the joint distribution of a multivariate data set. A typical approach is to use histograms. However, conventional histograms (with dimension-normal data partitions) create an exponential number of partitions as the number of dimensions increases. In previous work, we presented the DataSphere method for data space partitioning and used it for several types of analysis. A DataSphere creates O(d) partitions on d-dimensional data. In this paper, we generalize DataSphere partitioning to create any polynomial number O(d^k) of partitions, where k ranges from 1 to d, by using hyperpyramids and hyperspheres. Each of these partitioning schemes is hierarchical, allowing data cube-like roll-up and drill-down data analysis. We present several examples of how these partitioning schemes can be used, including visualization.
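The gap between exponential and O(d) partition counts can be illustrated with a small sketch. The code below is not taken from the paper: it assumes one plausible reading of a pyramid-and-layer scheme, in which each point falls into one of 2d pyramids (chosen by the coordinate with the largest normalized deviation from a center, and its sign) and into one of a fixed number of distance layers. The function names, the choice of the mean as center, the standard deviation as scale, and the equal-count layer boundaries are all illustrative assumptions.

```python
import bisect
import math
from statistics import mean, pstdev


def grid_cell_count(d, bins_per_dim):
    """Cells in an axis-aligned grid histogram: exponential in d."""
    return bins_per_dim ** d


def pyramid_layer_count(d, n_layers):
    """(pyramid, layer) partitions in this sketch: linear in d."""
    return 2 * d * n_layers


def _normalize(point, center, scale):
    # Per-dimension deviation from the center, in units of the scale.
    return [(x - c) / s for x, c, s in zip(point, center, scale)]


def fit(points, n_layers):
    """Estimate center, scale and layer boundaries (assumed choices)."""
    d = len(points[0])
    center = [mean(p[j] for p in points) for j in range(d)]
    # Fall back to 1.0 when a dimension is constant, to avoid dividing by 0.
    scale = [pstdev([p[j] for p in points]) or 1.0 for j in range(d)]
    dists = sorted(
        math.sqrt(sum(z * z for z in _normalize(p, center, scale)))
        for p in points
    )
    # Equal-count cut points on the distance from the center.
    cuts = [dists[(i * len(dists)) // n_layers] for i in range(1, n_layers)]
    return center, scale, cuts


def assign(point, center, scale, cuts):
    """Return (pyramid index in [0, 2d), layer index in [0, n_layers))."""
    z = _normalize(point, center, scale)
    # The dominant dimension selects the pyramid; its sign selects the half.
    j = max(range(len(z)), key=lambda i: abs(z[i]))
    pyramid = 2 * j + (1 if z[j] >= 0 else 0)
    dist = math.sqrt(sum(v * v for v in z))
    layer = bisect.bisect_right(cuts, dist)
    return pyramid, layer


if __name__ == "__main__":
    print(grid_cell_count(10, 4), "grid cells vs",
          pyramid_layer_count(10, 4), "pyramid-layer partitions")
```

With 10 dimensions and 4 bins (or layers), the grid has 4^10 cells while this scheme has 2 * 10 * 4 partitions, which is the kind of contrast the abstract describes. The generalization to O(d^k) partitions is not sketched here, since the abstract does not give its construction.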
Similar resources
Efficient high dimension data clustering using constraint-partitioning k-means algorithm
With the ever-increasing size of data, clustering of large dimensional databases poses a demanding task that should satisfy both the requirements of the computation efficiency and result quality. In order to achieve both tasks, clustering of feature space rather than the original data space has received importance among the data mining researchers. Accordingly, we performed data clustering of h...
Effective Spatial Data Partitioning for Scalable Query Processing
Recently, MapReduce based spatial query systems have emerged as a cost effective and scalable solution to large scale spatial data processing and analytics. MapReduce based systems achieve massive scalability by partitioning the data and running query tasks on those partitions in parallel. Therefore, effective data partitioning is critical for task parallelization, load balancing, and directly ...
Low-Quality Dimension Reduction and High-Dimensional Approximate Nearest Neighbor
The approximate nearest neighbor problem (ε-ANN) in Euclidean settings is a fundamental question, which has been addressed by two main approaches: Data-dependent space partitioning techniques perform well when the dimension is relatively low, but are affected by the curse of dimensionality. On the other hand, locality sensitive hashing has polynomial dependence in the dimension, sublinear query...
A Class of Region-preserving Space Transformations for Indexing High-dimensional Data
This study introduces a class of region-preserving space transformation (RPST) schemes for accessing high-dimensional data. The access methods in this class differ with respect to their space-partitioning strategies. The study develops two new static partitioning schemes that can split each dimension of the space within linear space complexity. They also support an effective mechanism for handli...
Interactive Rendering of Volumetric Data Sets
The bela architecture for interactive rendering of regularly structured volumetric data sets is presented. The proposed architecture is scalable and uses custom processors to achieve high-speed shading, projection, and composition of voxel primitives. A general purpose image composition network supports the accumulation of both volumetric and geometric elements into the final rendered scene. Da...
Publication date: 1999